Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

align from-less query semantics with SQL #5418

Merged
merged 2 commits into from
Nov 1, 2024
Merged

align from-less query semantics with SQL #5418

merged 2 commits into from
Nov 1, 2024

Conversation

mccanne
Copy link
Collaborator

@mccanne mccanne commented Nov 1, 2024

This commit changes the interpretation of a query that omits the from operator. Previously, a lake query would use the default "HEAD" parameter as an implied "from HEAD". This contradicts SQL which assumes a table with one empty row (like the implied yield null case).

The new assumption in a lake query for a from-less query is an implied null source so that a "select 'hello'" query will return a single 'hello' rather than a sequence based on the implied read from HEAD.

Similary, a from-less command-line query (i.e., super -c) with no file arguments also has an automatic null scan inserted.

As part of this change, we moved the logic for inserting the null scan (along with the logic for inserting DefaultScan of zio.Readers) into the semantic pass (instead of by lake/file compiler).

These changes mean that lake queries that read from a pool require an explicit "from" and there is no implied pool from HEAD. Because of this, we removed the HEAD references in the code paths that involve lake queries. HEAD is still used by the various "super db" commands to refer to a default pool/branch/commit.

Currently, the null scan does not follow the done protocol and restart. We will need to add this when we want from-less subqueries to restart.

This commit changes the interpretation of a query that omits the
from operator.  Previously, a lake query would use the default
"HEAD" parameter as an implied "from HEAD".  This contradicts
SQL which assumes a table with one empty row (like the implied
yield null case).

The new assumption in a lake query for a from-less query is
an implied null source so that a "select 'hello'" query will
return not a single 'hello' rather than a sequence based on the
implied read from HEAD.

Similary, a from-less command-line query (i.e., super -c)
with no file arguments also has an automatic null scan inserted.

As part of this change, we moved the logic for inserting the
null scan into the DAG (along with the logic for inserting
DefaultScan of zio.Readers) into the semantic pass.

These changes mean that lake queries that read from a pool require
an explicit "from" and there is no impleied pool from HEAD.
Because of this, we removed the HEAD references in the code paths
that involve lake queries.  HEAD is still used by the various
"super db" commands to refer to a default pool/branch/commit.

Currently, the null scan does not follow the done protocol
and restart.  We will need to add this when we want from-less
subqueries to restart.
@mccanne mccanne requested a review from a team November 1, 2024 21:14
return errors.New("internal error: AST seq cannot be empty")
return errors.New("delete where command requires an expression")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: The parser shouldn't know about the "delete where" command so I'd leave this error string as is.

@@ -39,6 +43,9 @@ func ParseQuery(query string, filenames ...string) (*AST, error) {
if err != nil {
return nil, err
}
if files.Text == "" {
return &AST{seq: []ast.Op{}, files: files}, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
return &AST{seq: []ast.Op{}, files: files}, nil
return &AST{files: files}, nil

} else if extInput {
seq.Prepend(&dag.DefaultScan{Kind: "DefaultScan"})
} else {
// This is a local query and there's not external input
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// This is a local query and there's not external input
// This is a local query and there's no external input

@mccanne mccanne merged commit f1a996d into main Nov 1, 2024
4 checks passed
@mccanne mccanne deleted the pool-from branch November 1, 2024 22:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants